1.0 Introduction


This  document  summarizes  work  performed  by  Delta  Information 
Systems,  Inc.  for  the  Office  of  Technology  and  Standards  of  the 
National Communications System, an organization of the U.S.
Government,  under  contract  Number  DCA100-83-C-0047.  The  Office  of 
Technology  and  Standards,  headed  by  National  Communications  System 
Assistant  Manager  Marshall  L.  Cain,  is  responsible  for  the 
management  of  the  Federal  Telecommunications  Standards  Program, 
which  develops  telecommunication  standards  whose  use  is  mandatory 
by  all  Federal  agencies. 

Consideration  is  now  being  given  to  possible  CCITT  standards 
for  Group  4  Facsimile  which  refers  to  the  transmission  of  an  A4 
sized  page  over  data  networks  containing  error  control.  It  is 
likely  that  the  basic  coding  technique  for  Group  4  transmissions 
will  be  some  advanced  form  of  the  Modified  READ  code,  which  is  the 
optional  compression  algorithm  for  Group  3.  The  purpose  of  this 
study  is  to  investigate  the  more  advanced  Mixed  Mode  coding 
technique  which  will  be  one  service  of  Group  4  as  shown  in  Figure 
1-1. In a mixed mode system the information printed on a page is
divided into two parts: symbols (letters, numerals, punctuation,
etc.) and graphics (logos, signatures, sketches, etc.). The study
examines possible techniques for segmenting a document into graphic
and symbol areas and assembling a code that represents the entire
document.


CLASSES OF GROUP 4 TERMINALS

Parameters to be considered include compression, commonality with
facsimile and TELETEX 1/ transmissions, and complexity of
implementation.

Four  segmentation  techniques  were  selected  for  analysis.  The 
techniques  were  designed  to  differ  from  each  other  as  much  as 
possible,  so  as  to  display  a  wide  variety  of  characteristics.  For 
each  technique,  many  minor  modifications  would  be  possible,  but  it 
is  not  expected  that  these  modifications  would  alter  the 
conclusions  drawn  from  the  study. 

The  segmentation  techniques  analyzed  are: 

-  SYMBOL  REMOVAL/SCAN  LINE 

-  SYMBOL  REMOVAL/LINE  OF  SYMBOLS 

-  EXTENDED  TELETEX 

-  SYMBOL  REMOVAL/HYBRID 

Section  2  presents  descriptions  of  the  four  mixed-mode 
segmentation  alternatives  considered.  Section  3  describes  the 
assumptions  and  methodology  for  measuring  compression,  and  the 
compression  computations  themselves.  Section  4  discusses  the 
commonality of each alternative with Group 3 facsimile, Group 4
facsimile, and TELETEX. It also summarizes compression and
discusses the complexity of implementation of each technique.
Finally, Section 5 compares the alternatives and draws conclusions.

The CCITT has determined that the 7 layer OSI (Open System
Interconnect) protocol, which has been developed by the ISO
(International Standards Organization), will be used for Group 4
facsimile. Figure 1-2 illustrates the top 4 OSI levels,
emphasizing the relationship between the Teletex and Group 4
facsimile services. The S. and T. are the designations of the
CCITT Recommendations for each protocol layer. Note that S.a is
the key recommendation for mixed mode operation. This standard
has not yet been finalized. The most recent draft of this
recommendation is included in Appendix A for reference purposes.

1/ TELETEX refers to a CCITT recommendation which is now under
development for communication between word processors.
FRAMEWORK OF CCITT RECOMMENDATIONS
FOR  GROUP  4  FACSIMILE  APPARATUS 

FIGURE  1-2 
2.0 Task 1 - Develop Candidate Mixed Mode Algorithms

Four  mixed-mode  segmentation  techniques  are  selected  for 
consideration  in  this  study.  The  techniques  are: 

Symbol  Removal/Scan  Line 

Symbol  Removal/Line  of  Symbols 

Extended  Teletex 

Symbol  Removal/Hybrid 

In  the  three  symbol  removal  techniques,  the  black  pels 
associated  with  recognized  symbols  are  coded  and  "removed" 
(changed to white), and then the entire document is encoded using
the  Modified  READ  code,  including  areas  where  the  symbols  were. 
In  the  Extended  TELETEX  technique,  the  Modified  READ  code  is  used 
only  for  areas  that  do  not  have  encoded  characters.  All 
techniques  presume  existence  of  a  stored  library  of  symbols. 

2.1  Symbol  Removal/Scan  Line 

This  coding  technique  is  very  similar  to  the  Combined  Symbol 
Matching  algorithm  which  is  described  in  Appendix  B.  In  this 
approach  the  document  is  scanned,  from  top  to  bottom,  and  from 
left  to  right,  until  a  group  of  black  pels  is  encountered  that 
matches  a  symbol  in  the  stored  library.  All  black  pels  within  the 
rectangular  symbol  space  are  then  changed  to  white,  and  the  symbol 
code and position are recorded. After the symbols have been
"removed", the document is rescanned in principle and encoded
using the Modified READ code (k = ∞, no EOL code). The detected
symbol codes are inserted before the READ code of the scan line in
which the top of the symbol occurs. The presence of a symbol code,
rather than a READ code, is indicated by a single bit at the
beginning of every scan line. If the bit indicates that there are
symbols on the scan line, the 8-bit symbol code follows, and this
in turn is followed by an 11-bit horizontal position code word
(2^11 = 2,048, being greater than the 1,728 pels in the scan line).
This  may  be  followed  by  additional  symbol/horizontal-position  code 
pairs  for  any  other  symbols  that  may  have  been  detected  on  the 
scan  line  (in  order  of  horizontal  position).  Finally,  the  symbol 
data  is  terminated  by  a  special  8-bit  symbol  code  that  indicates 
there  are  no  more  symbols  on  the  scan  line.  Then  the  READ  code 
for  that  line  is  transmitted. 
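
The per-line framing just described can be tallied with a short sketch. The function name and the input format (a count of symbols per scan line) are assumptions made for illustration; the bit widths are the ones stated in the text. The Modified READ bits for the residue are not modeled here.

```python
FLAG_BITS = 1     # one bit per scan line: symbols present or not
SYMBOL_BITS = 8   # symbol code
HPOS_BITS = 11    # horizontal position; 2**11 = 2,048 covers 1,728 pels
END_BITS = 8      # special code: no more symbols on this scan line


def scan_line_symbol_bits(symbols_per_line):
    """Return the symbol-related bits for a page (READ-coded residue excluded).

    symbols_per_line: for each scan line, the number of symbols whose top
    edge falls on that line (hypothetical input format).
    """
    total = 0
    for count in symbols_per_line:
        total += FLAG_BITS
        if count > 0:
            # each symbol carries its own horizontal position
            total += count * (SYMBOL_BITS + HPOS_BITS) + END_BITS
    return total
```

Under these assumptions a scan line with no symbols costs a single flag bit, while every recognized symbol costs 19 bits plus a share of the 8-bit terminator.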

Notice  that  in  this  technique  the  recognized  symbols  will  be 
encoded  as  they  are  first  encountered  by  the  scanning  process, 
regardless  of  where  they  appear  relative  to  other  symbols  or 
graphics.  The  vertical  position  of  the  symbols  is  implied  by  the 
scan  line  on  which  the  symbol  code  appears. 

2.2  Symbol  Removal/Line  of  Symbols 

In  this  technique,  as  in  other  symbol  removal  approaches,  the 
symbols  are  detected,  "removed",  and  their  codes  and  positions  are 
recorded.  The  symbols  are  then  organized  into  lines  of  symbols, 
based  on  symbol  position,  height,  hang  down,  etc.  Account  is 
taken  of  small  amounts  of  line  skew,  and  a  single  vertical 
position is assigned to the entire line of symbols. When this
process is complete, each printed line on the document should be
contained within a line of symbols. Spaces between symbols, having
several different widths up to about 2 normal symbol spaces, are
filled with appropriate blank characters. If the space between
symbols is greater than 2 symbol spaces, the line of symbols is
broken into segments.
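
The segment-breaking rule can be sketched as follows. This is only an illustration: the names, the unit of measurement (normal symbol spaces), the use of exactly one blank character per gap, and the minimum gap that earns a blank are all assumptions, not details given in the study.

```python
BLANK = "BLANK"  # hypothetical stand-in for a blank-character symbol code


def split_into_segments(symbols, gaps, max_gap=2.0, min_gap=0.5):
    """Group one line of symbols into segments.

    symbols: symbol codes in left-to-right order.
    gaps: gaps[i] is the space between symbols[i] and symbols[i + 1],
          measured in normal symbol spaces (assumed unit).
    A gap wider than max_gap starts a new segment; a gap of at least
    min_gap is filled with one blank character (an assumption here).
    """
    if not symbols:
        return []
    if len(gaps) != len(symbols) - 1:
        raise ValueError("need exactly one gap between each pair of symbols")
    segments = [[symbols[0]]]
    for symbol, gap in zip(symbols[1:], gaps):
        if gap > max_gap:
            segments.append([symbol])      # break the line into a new segment
        else:
            if gap >= min_gap:
                segments[-1].append(BLANK)  # fill the space with a blank
            segments[-1].append(symbol)
    return segments
```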

The entire document, less recognized symbols, is transmitted
using Modified READ code. When a scan line having the vertical
position of a line of recognized symbols is encountered, a special
12-bit code (which could be an EOL code) is inserted. This
changes the mode from graphics to symbols. This is followed by an
11-bit code giving the horizontal position of the first symbol.
Then the symbol codes for each symbol in the segment are sent,
followed by a special 8-bit end-of-segment symbol code. This, in
turn, is followed by an 11-bit distance-to-the-next-segment
symbol code. The last segment of symbols on the line is followed
by a special 8-bit end-of-line symbol code instead of the
end-of-segment code. This changes the mode back to graphics, and
Modified READ code is continued until another scan line with a
line of symbols is encountered.
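
As a rough check on the overhead of this framing, the bits for one line of symbols can be counted as below. The function name and the input format are illustrative assumptions; the code widths come from the description above, and blank characters are counted as ordinary 8-bit symbols.

```python
MODE_CHANGE_BITS = 12   # graphics-to-symbols code
HPOS_BITS = 11          # horizontal position of the first symbol
SYMBOL_BITS = 8         # each symbol code, blanks included
END_SEGMENT_BITS = 8    # end-of-segment code
DISTANCE_BITS = 11      # distance to the next segment
END_LINE_BITS = 8       # end-of-line code after the last segment


def line_of_symbols_bits(segment_lengths):
    """Return the bits needed to send one line of symbols.

    segment_lengths: number of symbol codes in each segment, left to
    right (hypothetical input format).
    """
    if not segment_lengths:
        raise ValueError("a line of symbols has at least one segment")
    breaks = len(segment_lengths) - 1
    return (MODE_CHANGE_BITS + HPOS_BITS
            + SYMBOL_BITS * sum(segment_lengths)
            + breaks * (END_SEGMENT_BITS + DISTANCE_BITS)
            + END_LINE_BITS)
```

The fixed cost per line is 31 bits, so the overhead per symbol falls quickly as lines get longer.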

As  with  the  other  symbol  removal  techniques,  a  recognized 
symbol  will  be  encoded  wherever  it  is  located,  since  lines  of 
symbols  may  overlap  vertically,  and  each  line  of  symbols  may 
contain as few as one symbol. There may be some inaccuracies in
positioning symbols, since spaces between symbols of 1 or 2 pels
will probably not be encoded, and the horizontal position of a
symbol code could be in error at the end of a long line of
symbols.


2.3  Extended  TELETEX  -  CR/LF  option 

In  this  approach  the  entire  document  is  divided  into 
character spaces, except for areas that are defined as graphics,
as  discussed  below.  All  character  symbols,  including  blanks,  are 
transmitted  using  8-bit  symbol  codes. 

The  graphics  are  transmitted  by  Modified  READ  code  as  they 
occur  within  a  line  of  symbols.  First,  a  special  8-bit  symbol 
code  is  used  to  designate  the  transition  from  symbol  codes  to 
graphics.  This  is  followed  by  an  11-bit  code  giving  the  width  of 
the  graphics  area.  (The  height  of  the  graphics  area  is  defined  by 
the  height  of  the  symbol  font.)  Then  the  READ  code  for  the 
graphics  is  sent.  The  length  of  the  READ  code  is  defined  by  the 
width  and  height  of  the  graphics  area,  so  the  transition  back  to 
symbol  codes  does  not  require  a  code. 

In  the  CR/LF  option,  instead  of  transmitting  a  series  of 
blank  symbol  codes  at  the  right  of  the  line,  a  special  8-bit  code 
is  used  to  designate  the  last-symbol-on-the-line.  This,  of 
course,  would  have  to  be  to  the  right  of  any  graphics  on  the  line. 
This  last-symbol-on-the-line  code  would  direct  the  receiver  to 
start  on  the  next  line  of  symbols,  and  would  replace  the  CR  and  LF 
codes  of  TELETEX.  For  reasons  of  commonality  it  may  be  preferable 
to  keep  the  two  standard  TELETEX  symbols  for  this  purpose. 
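
A minimal sketch of the bit count for one line of text under the CR/LF option follows. The function name and arguments are assumptions for illustration, and the READ-coded length of each graphics area is taken as a given input rather than computed.

```python
SYMBOL_BITS = 8          # every character, including blanks
TO_GRAPHICS_BITS = 8     # special symbol code: symbols-to-graphics
WIDTH_BITS = 11          # width of the graphics area
LAST_SYMBOL_BITS = 8     # last-symbol-on-the-line code (CR/LF option)


def extended_teletex_line_bits(num_symbols, graphics_read_bits=()):
    """Return the bits for one line of symbols.

    num_symbols: characters sent on the line, blanks included.
    graphics_read_bits: READ-coded length of each graphics area that
    occurs within the line (hypothetical input; not computed here).
    """
    bits = SYMBOL_BITS * num_symbols
    for read_bits in graphics_read_bits:
        # no code is needed to return to symbols: width and height
        # already define where the READ code ends
        bits += TO_GRAPHICS_BITS + WIDTH_BITS + read_bits
    # keeping the two standard TELETEX codes (CR and LF) instead
    # would cost 16 bits here rather than 8
    return bits + LAST_SYMBOL_BITS
```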

This technique is considered primarily as a method to
incorporate graphics (such as logos and signatures) into
computer-generated text. Therefore graphics areas are defined,
probably by the user, as rectangular areas which may contain a
mixture of graphics and symbols.

Since  all  lines  of  symbols  must  have  proper  spacing,  symbols 
that  are  not  aligned  with  the  majority  of  the  symbols  must  be 
treated  as  graphics. 

2.4  Symbol  Removal/Hybrid 

This technique combines features of the other two symbol
removal techniques to make it more robust than either, in that it
is designed to handle both isolated (or arbitrarily located)
symbols and symbol strings in lines or segments.

In  this  technique,  as  in  other  symbol  removal  approaches,  the 
symbols  are  detected,  "removed",  and  their  codes  and  positions  are 
recorded.  Spaces  between  symbols  (up  to  2)  are  filled  with 
appropriate  blank  characters. 

The  presence  or  absence  of  a  symbol  code,  rather  than  a  READ 
code,  is  indicated  by  a  single  bit  at  the  beginning  of  every  scan 
line.  In  addition,  a  single  bit  preceding  each  symbol  code 
indicates  whether  the  symbol  is  contiguous  or  not,  i.e.,  not 
followed  by  more  than  2  blank  spaces.  If  the  symbol  is  not 
contiguous, it is preceded by a horizontal position code; otherwise
the symbol code follows immediately.

A  special  8-bit  symbol  code  terminates  the  symbol  string  at 
the  end  of  the  line  of  symbols.  Then  the  READ  code  for  that  line 
is  transmitted. 
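
The hybrid framing can be counted in the same way. This sketch assumes that the terminating symbol code carries no contiguity bit of its own, which the description does not settle; the function name and input format are likewise illustrative.

```python
LINE_FLAG_BITS = 1    # per scan line: symbol codes present or not
CONTIG_FLAG_BITS = 1  # per symbol: contiguous or not
SYMBOL_BITS = 8       # symbol code, blanks included
HPOS_BITS = 11        # sent only for non-contiguous symbols
END_BITS = 8          # special code terminating the symbol string


def hybrid_line_bits(contiguous_flags):
    """Return the symbol-related bits for one scan line.

    contiguous_flags: one boolean per symbol on the line, True when the
    symbol is contiguous and needs no horizontal position code
    (hypothetical input format).
    """
    bits = LINE_FLAG_BITS
    if contiguous_flags:
        for contiguous in contiguous_flags:
            bits += CONTIG_FLAG_BITS + SYMBOL_BITS
            if not contiguous:
                bits += HPOS_BITS
        # assumption: the terminator has no contiguity flag
        bits += END_BITS
    return bits
```

A contiguous symbol costs 9 bits against 19 in the Scan Line technique, which is where the hybrid recovers most of the Line of Symbols advantage.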
3.0 Task 2 - Measurement of Compression

3.1 Methodology for Measuring Compression


For each of four proposed mixed mode techniques, an estimate
of compression has been made. Estimates of compression were made
for CCITT test Document 1 and for two computer generated
documents.


It  should  be  recognized  that  compression  values  calculated  in 
this  report  are  estimates  only,  and  should  not  be  regarded  as 
actual  measured  numbers.  However,  it  is  expected  that  the 
relative  compressions  of  the  various  segmentation  techniques  are 
accurate,  since  the  same  assumptions  were  used  for  all  of  them. 


3.2  Assumptions 


In  making  compression  estimates,  the  following  assumptions 
were  made: 


(1)  Each  symbol  is  encoded  using  8  bits,  which  allows 
up  to  256  different  symbols. 


(2)  Several  of  the  256  symbol  codes  can  be  made 
available  for  indicating  termination  of  symbol 
transmission,  or  other  requirements  of  the  segmentation 
technique  employed. 




(3)  A  stored  library,  suitable  for  the  document  being 
transmitted,  is  available  at  both  sending  and  receiving 
terminals. 

(4) Bits required to identify the proper symbol library
to the receiving terminal are neglected.

(5)  The  stored  library  will  accommodate  either  fixed  or 
proportionally  spaced  fonts,  including  several  widths 
for  word  spaces. 

(6)  All  characters  of  the  principal  font  used  in  the 
document  are  in  fact  recognized  as  such,  and  will  be 
encoded  as  symbols,  subject  to  the  rules  of  the 
proposed  technique. 

(7)  Lines  of  symbols  can  be  accommodated  despite  slight 
skews  of  the  printed  lines. 

(8)  The  characters  of  the  principal  font  include  math 
symbols,  italics,  and  Greek  letters,  but  not  subscripts 
or  superscripts,  or  long  horizontal  or  vertical  lines. 

(9) Graphic data is transmitted using the Modified READ
code, without EOLs and with k = ∞.

(10)  The  number  of  bits  required  to  transmit  increased 
width  of  white  spaces  by  means  of  Modified  READ  can  be 
neglected.  This  follows  because  the  spacing  between 
groups of black pels (such as symbols) usually only has
to be specified once, and the READ code length does not
grow  rapidly  with  the  length  of  a  white  run. 

(11) Each A4 source document, normally 216 x 297 mm,
consists of 2,376 rows with 1,728 pels per row (i.e.,
resolution equals 8 pels per mm, or approximately 200
pels per inch).

(12)  Scanned  documents,  stored  on  tape,  are  retrieved 
onto  disk  as  an  image  file  commensurate  with  a  16-bit 
computer  word  size.  Thus  the  computer  image  file 
consists  of  2,336  rows  of  1,728  bits  each  row.  Since 
the  computer  image  contains  all  the  black  pels  of  the 
original  document,  any  process,  such  as  the  modified 
READ  algorithm,  performed  on  the  computer  image  may 
represent  the  same  process  performed  on  the  original 
document  with  negligible  error  or  vice  versa. 

(13)  Code  transmissions  will  not  experience  any 
transmission  errors,  so  addition  of  redundancy  for 
error  control  is  not  required. 

3.3  Calculating  Compression 

For  each  technique  the  number  of  bits  required  to  construct 
the  message  is  totaled.  This  includes  any  flags,  tag  bits,  symbol 
codes,  end  of  symbols  on  line,  end  of  segment  graphics,  symbol 
mode  changes  and  horizontal  and  vertical  positions.  The  number  of 
bits  required  for  each  of  these  functions  is  given  in  Table  3-1: 


Table 3-1

Table of Bit Requirements for Data Functions

Data Function                          Data Requirement (Bits)

Scan Line Flag                                   1
Contiguous Symbol Flag                           1
Symbol Code                                      8
End of Symbols on Line or Segment                8
Symbol to Graphics                               8
Vertical Position-Symbols on Line               12
Graphics to Symbols                             12
Horizontal Position                             11


The compression is calculated by dividing the total number of pels,
as referred to the source document, by the total message bits. The
number of pels is always 2,376 x 1,728 = 4,105,728.
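
The calculation can be written out as a small sketch. The names are illustrative; the page dimensions are those of assumption (11), and the message bit total is whatever the tally for a given technique produces.

```python
ROWS = 2376            # scan lines per A4 source document
PELS_PER_ROW = 1728    # pels per scan line
TOTAL_PELS = ROWS * PELS_PER_ROW   # 4,105,728


def compression(total_message_bits):
    """Return source pels divided by the bits in the mixed-mode message."""
    if total_message_bits <= 0:
        raise ValueError("message must contain at least one bit")
    return TOTAL_PELS / total_message_bits
```

For example, a hypothetical message of 100,000 bits would correspond to a compression of about 41.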

In  the  three  symbol  removal  techniques,  the  black  pels 
associated  with  recognized  symbols  are  coded  and  "removed" 
(changed to white), and then the entire residual document is
encoded using the Modified READ code (k = ∞, no EOL code),
including the areas formerly occupied by the "removed" symbols.
In the Extended TELETEX technique, the Modified READ code is used
only  for  areas  that  do  not  have  encoded  characters. 

Values  for  document  related  parameters  used  in  calculating 
compression  estimates  are  given  in  Table  3-2. 
Table of Document Related Parameters
used  for  Compression  Estimates 


3.4  Scanned  Document  -  CCITT  No.  1,  Figure  3-1 


3.4.1  Symbol  Removal/Scan  Line 

Figure  3-2  illustrates  the  composition  of  a  mixed-mode 
message  using  this  technique.  As  indicated  in  Figure  3-3,  all  of 
the  typewritten  symbols  are  recognized,  encoded  and  "removed". 
The  presence  or  absence  of  a  symbol  code,  rather  than  a  READ  code, 
is  indicated  by  a  single  bit  at  the  beginning  of  every  scan  line. 
A  special  8  bit  symbol  code  terminates  the  symbols  on  the  scan 
line  and  indicates  the  end  of  symbols  and  start  of  graphics.  Each 
symbol's  horizontal  position  is  independently  encoded.  Since  the 
vertical  position  is  implied  by  the  scan  line  on  which  the  symbol 
code  appears,  no  account  need  be  taken  of  spaces  between  words  or 
between line segments. The Modified READ code (k = ∞, no EOL code)
is  applied  to  the  residue  (Figure  3-4)  after  removal  of 
typewritten  symbols.  A  summary  of  the  compression  estimate  for 
the  Symbol  Removal/Scan  Line  Technique  applied  to  the  CCITT-1 
document  is  presented  in  Table  3-3. 

3.4.2  Symbol  Removal/Line  of  Symbols 

Figure  3-5  illustrates  the  composition  of  a  mixed-mode 
message using this technique. As indicated in Figure 3-6, all
typewritten symbols are recognized, encoded and "removed". The
symbols are organized into lines of symbols with spaces between
symbols (up to two) filled by blank characters. The resulting
string of symbol codes is preceded by a graphics-to-symbols code
and a horizontal-position-of-the-first-symbol code, and is terminated
Figure 3-1

CCITT Document Number 1


Symbol  Removal/Scan  Line 
CCITT  Document  Number  1 
Figure 3-4

Residue after removal of symbols
CCITT Document Number 1

Symbol Removal/Scan Line

Figure 3-5

MESSAGE COMPOSITION

SYMBOL REMOVAL/LINE OF SYMBOLS

graphics mode using Modified READ code - variable bits
symbol code - 8 bits
indicates change from graphics to symbols - 12 bits
horizontal position of first symbol in line - 11 bits
end of symbols on line - 8 bits
end of symbols on segment - 8 bits
distance between segments - 11 bits

Figure 3-6

Symbol Removal/Line of Symbols
CCITT Document Number 1
with an end-of-symbols-on-line code. More than two spaces
between symbols on a line breaks the line into segments which,
except for the last segment, are each followed by an
end-of-symbols-on-segment code and a distance-between-segments
code. The last segment in a line is terminated with an
end-of-symbols-on-line code. The Modified READ code (k = ∞, no EOL
code) is applied to the residue (Figure 3-4). A summary of the
compression estimate for the Symbol Removal/Line of Symbols
Technique applied to the CCITT-1 document is presented in Table
3-4.

3.4.3  Extended  TELETEX  -  CR/LF  Option 

Figure  3-7  illustrates  the  composition  of  a  mixed-mode 
message  using  this  technique.  Figure  3-8  shows  symbols  encoded  by 
the Extended TELETEX technique (with transmission of CR/LF symbol)
and also shows boxed-in areas of graphics transmitted by the
Modified READ code (k = ∞, no EOL code). All typewritten symbols
are encoded, as are the blank characters used to space over to the
first character on each line of text and to the left hand
horizontal position of the boxed-in areas of graphics (see Figures
3-9 through 3-14). Symbols-to-Graphics and Graphics Width codes
guide deployment of the Modified READ code. A summary of the
compression  estimate  for  the  Extended  TELETEX-CR/LF  Option 
technique  applied  to  the  CCITT-1  document  is  presented  in  Table 
Extended TELETEX-CR/LF Option
CCITT  Document  Number  1 




Figure  3-9 
Logo  Graphic 
CCITT  Document  Number  1 
Figure 3-10
SLEREXE Graphic
CCITT Document Number 1

SAPORS LANE Graphic
CCITT Document Number 1

Figure 3-13
Signature Graphic
CCITT Document Number 1

Figure 3-14
Registration Graphic
CCITT Document Number 1

Summary of Compression Estimate


3.4.4  Symbol  Removal/Hybrid 

Figure  3-15  illustrates  the  composition  of  a  mixed-mode 
message using this technique. As shown in Figure 3-16, all
typewritten symbols, including spaces between symbols (up to two),
are recognized, encoded and "removed". The presence or absence of
a symbol code, rather than a READ code, is indicated by a single
bit at the beginning of every scan line. In addition, a single bit
preceding each symbol code indicates whether the symbol is
contiguous or not. If the symbol is not contiguous, it is preceded
by a horizontal position code; otherwise the symbol code follows
immediately. A special 8-bit symbol code terminates the symbol
string at the end of the line of symbols. The Modified READ code
(k = ∞, no EOL code) is applied to the residue (Figure 3-4). A
summary  of  the  compression  estimate  for  the  Symbol  Removal/Hybrid 
technique  applied  to  the  CCITT-1  document  is  presented  in  Table 
3-6. 

3.5  Computer  Generated  Documents  -  A  and  B 

Computer  generated  documents  A  and  B  are  shown  respectively 
in  Figures  3-17  and  3-18.  Compression  estimates  for  the  computer 
generated  documents  are  calculated  in  the  same  fashion  as 
previously  described  in  Section  3.4  for  the  CCITT  document  number 
1.  In  the  Symbol  Removal/Scan  Line  Technique  symbols  are  removed 
as  illustrated:  Document  A,  Figure  3-19;  Document  B,  Figure  3-21. 
After  symbol  removal  READ  Code  is  applied  to  the  residue:  Document 
A,  Figure  3-20;  Document  B,  Figure  3-22. 


Figure 3-15
Message Composition
Symbol Removal/Hybrid
Figure 3-16
Symbol  Removal/Hybrid 
CCITT  Document  Number  1 


Income  is  an  inflow  of  assets,  but  it  must  be  recognized 
that  there  are  inflows  of  assets  which  are  not  income.  Obviously 
an  inflow  of  capital  funds  from  stockholders  is  not  income  to  a 
corporation,  nor  should  a  business  regard  as  income  an  inflow  of 
assets  which  is  offset  by  an  increase  in  liabilities.  Income 
consists  of  an  inflow  of  assets  in  the  form  of  cash  receivables, 
or  other  property  from  customers  and  clients,  and  is  related  to 


GRAPHICS  OUTPUT 


the  disposal  of  goods  and  the  rendering  of  services.  If  income 
is  earned  by  selling  goods,  it  may  also  be  called  profit;  the 
term  profit  is  not  properly  applied  to  income  derived  from  the 
rendering  of  services. 

A  basic  criterion  for  the  determination  of  the  period  in 
which  income  may  be  regarded  as  earned  may  be  stated  as  follows: 
Income  should  not  be  regarded  as  earned  until  an  asset  increment 
has  been  realized,  or  until  its  realization  is  reasonably  assured 

Page  1 

Figure  3-17 

Computer  Generated  Document  A 


FIGURE  2  EXTERIOR  VIEW  OF  PROPOSED  BUILDING 


We are running out of oil - and natural gas. Whether it's
exactly  30  years  or  more  makes  very  little  difference  in  the  long 
run.  As  we  begin  to  drill  more  deeply  into  hard-to-reach  reserves, 
the  supply  will  become  more  spotty  and  more  expensive.  So  start 
planning  for  oil-gas  alternatives. 

The  best  is  coal.  It's  conservatively  estimated  that  we  have 
300  years  of  coal  reserves.  However,  the  cost  of  mining  and 
transporting  it  will  grow  sharply  as  demand  builds.  (Much  of  the 
coal  will  be  difficult  to  reach,  too.) 

Should  a  company  convert  its  boilers  to  coal-burning  from 
oil  or  natural  gas-burning?  In  many  cases  the  answer  is  yes. 
Residue after removal of symbols
Document  A 


Figure  3-21 

Symbol  Removal/Scan  Line 
Document  B 
Similarly, in the Symbol Removal/Line of Symbols Technique,
symbols  are  removed  as  illustrated:  Document  A,  Figure  3-23; 
Document  B,  Figure  3-24. 

The  Extended  TELETEX-CR/LF  Option  Technique  is  applied  as 
illustrated:  Document  A,  Figure  3-25;  Document  B,  Figure  3-27. 
READ  code  is  applied  to  the  boxed-in  graphics:  Document  A,  Figure 
3-26;  Document  B,  Figure  3-28  and  3-29. 

In  the  Symbol  Removal/Hybrid  Technique  symbols  are  removed  as 
illustrated: Document A, Figure 3-30; Document B, Figure 3-31.

The results for the four mixed mode compression techniques
are presented in Tables 3-7 to 3-10 for computer generated
documents A and B.
Symbol Removal/Line of Symbols
Document B

Symbol Removal/Line of Symbols

Summary of Compression Estimate
4.0 Task 3 - Analysis of Results
4.1  Compression 


The  compression  estimates,  calculated  in  Section  3  for  all 
four  segmentation  techniques,  are  summarized  in  Table  4-1.  For 
comparison,  the  compression  for  normal  Group  4  Modified  READ 
(k = ∞, no EOL) is also included in Table 4-1.

As expected, the results support the general conclusions that,
with increasing graphics relative to text content, the amount of
compression decreases and the distinction between the four mixed-mode
techniques diminishes, as does their advantage over straightforward
application of Modified READ (k = ∞, no EOL).

While  the  Symbol  Removal/Line  of  Symbols  technique  provides 
the  greatest  compression  of  all,  it  is  not  robust  with  respect  to 
arbitrary location of symbols. The Symbol Removal/Scan Line technique
does handle the case of arbitrary symbol location but does not take
advantage of the occurrence of lines of symbols. Furthermore, the
Symbol Removal/Scan Line technique provides the poorest
compression performance. The Symbol Removal/Hybrid technique
remains robust with respect to arbitrary symbol location and
also takes advantage of the occurrence of symbols in
lines, thereby providing compression close to that of the Symbol
Removal/Line of Symbols technique.


The Extended TELETEX with CR/LF option provides compression
intermediate between the Line of Symbols and Hybrid Symbol Removal
techniques, being marginally better than Symbol Removal/Hybrid for
the more graphics-intensive computer generated documents and
slightly inferior to Symbol Removal/Hybrid for the CCITT No. 1
document.

Table

Compression Technique    Scanned Document    Computer Generated Document

4.2  Complexity  of  Implementation 

Not much separates the various techniques in complexity. In all cases accommodation of scanned documents requires symbol recognition. The three Symbol Removal techniques have the same image storage requirements, independently of whether the recognized symbols are to be organized into lines of symbols or not. Generally, the 32 x 1728 = 55,296 bits required to accommodate the larger fonts and hang-down characters will permit symbol recognition, removal and organization into lines.

In the Symbol Removal/Scan Line technique each recognized symbol is incorporated into the transmission as the recognition occurs. In the Symbol Removal/Line of Symbols and the Symbol Removal/Hybrid techniques provision must be made for calculating the linear regression of horizontal and vertical symbol positions to identify line skew, and for performing the vertical realignment of the recognized symbols to a baseline that is necessary to remove the skew.
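The regression and realignment steps described above can be sketched as follows. This is only an illustration under stated assumptions: each recognized symbol is reduced to a (horizontal, vertical) reference position in pels, and the function names and sample coordinates are hypothetical rather than taken from the study.

```python
def fit_baseline(positions):
    """Least-squares fit of y = slope * x + intercept to symbol positions."""
    n = len(positions)
    mean_x = sum(x for x, _ in positions) / n
    mean_y = sum(y for _, y in positions) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in positions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in positions)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def remove_skew(positions):
    """Shift each symbol vertically so that the fitted line becomes horizontal."""
    slope, intercept = fit_baseline(positions)
    baseline = round(intercept)
    return [(x, round(y - slope * x - intercept) + baseline) for x, y in positions]


# Hypothetical symbol positions (pels) on a line skewed by 1 pel per 100 pels.
symbols = [(0, 200), (100, 201), (200, 202), (300, 203)]
slope, intercept = fit_baseline(symbols)
aligned = remove_skew(symbols)
```

The fitted slope measures the skew of the line; subtracting the fitted line from each vertical position places every symbol on a common baseline.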

Extended TELETEX, with CR/LF option, is presented as a technique for generating a mixed-mode message by a computer, not as a method for scanning a document and segmenting it into graphics and text.


4.3  Commonality 

This  section  discusses  the  commonality  or  ability  of  a  Mixed 
Mode  machine  to  transmit  messages  to,  or  receive  messages  from 
such  existing  machines  as: 

(1)  TELETEX 

(2)  Standard  Group  4  FACSIMILE,  without 
mixed-mode  capabilities 

(3)  Group  3  FACSIMILE 

Changes to these machines are not considered permissible because they are already in the field; rather, the changes considered here are those to Group 4 mixed-mode machines that are necessary to provide commonality with existing machines.

The basic core of commonality between the existing machines and mixed-mode machines is built in, in that all mixed-mode techniques considered use the TELETEX code and the Modified READ II code proposed for Group 4 FACSIMILE machines. The Group 4 code modifies the Group 3 code by:

(1) Using k=∞ instead of k=4 for 7.7 lines/mm,

(2)  Deleting  the  EOL  code  for  each  line, 

(3)  Eliminating  bit  stuffing  to  achieve  minimum  line  time. 

General compatibility between Group 4 and Group 3 machines, with respect to the encoding differences, is expected to be the rule rather than the exception. Therefore commonality with Group 3 machines is implicit in the discussion of commonality between standard and mixed-mode option Group 4 machines. No special discussion of commonality with Group 3 machines is necessary.

For  all  of  the  mixed-mode  techniques  to  achieve  commonality 
it  may  be  necessary  to  inhibit  information  about  the  stored 
library  to  be  used.  Other  than  this  general  requirement,  the 
Symbol  Removal/Line  of  Symbols  technique  requires  merely  the 
inhibiting  of  symbol  recognitions  to  produce  Group  4  transmissions 
while  reception  of  Group  4  transmissions  requires  no  modification 
whatsoever. 

In addition to the above, for Symbol Removal/Scan Line and Symbol Removal/Hybrid, code bits that change mode must be deleted on transmission and inserted on reception. For both techniques this is a single bit that precedes each scan line.
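The handling of the single mode bit can be sketched as follows. The sketch assumes that each coded scan line is held as a string of "0" and "1" characters and that symbol recognition has already been inhibited, so that every line is in graphics mode; the polarity of the mode bit and the function names are assumptions made for illustration only.

```python
GRAPHICS_MODE_BIT = "0"  # assumed value; the study does not state the polarity


def to_group4(mixed_mode_lines):
    """Transmission: delete the leading mode bit from every coded scan line."""
    return [line[1:] for line in mixed_mode_lines]


def from_group4(group4_lines):
    """Reception: insert a graphics mode bit before every coded scan line."""
    return [GRAPHICS_MODE_BIT + line for line in group4_lines]


# Two hypothetical coded scan lines, each preceded by the mode bit.
coded = [GRAPHICS_MODE_BIT + "1101", GRAPHICS_MODE_BIT + "0010"]
sent = to_group4(coded)
received = from_group4(sent)
```

Because the two operations are inverses for all-graphics lines, a mixed-mode machine can exchange such lines with a standard Group 4 machine without changing the Modified READ data itself.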

For  Extended  TELETEX,  a  Group  4  transmission  can  be  obtained 
by  inhibiting  all  symbol  recognitions,  including  blanks,  which 
will  force  the  entire  line  to  be  transmitted  as  graphics.  In 
addition,  the  codes  for  the  last  symbol  on  the  line  and  the 
graphics  width  would  have  to  be  deleted.  For  reception,  the  last 
symbol  code  and  a  graphics  width  code  of  1728  would  have  to  be 
added  before  each  scan  line  to  correct  the  Group  4  transmission 
for  compatibility  with  the  mixed-mode  receiver. 

For Extended TELETEX, to receive TELETEX transmissions no change is required except adding the code that identifies the stored library to use. In transmission, the graphics mode must be inhibited, with space symbol codes being transmitted whenever material that cannot be recognized as symbols is encountered. Also, CR/LF codes must be inserted for each line.

For all techniques except Extended TELETEX, when transmitting, the graphics mode must be inhibited, and a blank symbol used to replace each 20 pels of all-white or graphics pels. Also, CR/LF codes must be inserted at the end of each line (approximately 33 scan lines). Corresponding changes must be made for reception, namely adding coding for approximately 33 all-white scan lines for each LF, and deleting the CR/LF codes.
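The substitution of blanks and CR/LF codes can be sketched as follows. The figures of 20 pels per blank and 33 scan lines per text line come from the text; the representation of a text row as a list of segments, the treatment of runs shorter than 20 pels, and the function names are assumptions for illustration.

```python
PELS_PER_BLANK = 20      # from the text: one blank symbol per 20 pels
SCAN_LINES_PER_ROW = 33  # from the text: approximate scan lines per text line


def row_to_teletex(segments):
    """Convert one text row into TELETEX characters followed by CR/LF.

    segments: list of ("symbol", char) or ("pels", count) items, where a
    "pels" item stands for a run of all-white or graphics pels.
    """
    out = []
    for kind, value in segments:
        if kind == "symbol":
            out.append(value)
        else:
            # Assumption: a remainder shorter than 20 pels is dropped.
            out.append(" " * (value // PELS_PER_BLANK))
    return "".join(out) + "\r\n"


def white_scan_lines_for(text):
    """Reception: number of all-white scan lines implied by the LF codes."""
    return text.count("\n") * SCAN_LINES_PER_ROW


row = [("pels", 40), ("symbol", "A"), ("symbol", "B"), ("pels", 65)]
encoded = row_to_teletex(row)
```

In this sketch the graphics content is simply lost on transmission to a TELETEX machine, which is the behaviour the text describes when the graphics mode is inhibited.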

In  addition  for  Symbol  Removal/Line  of  Symbols,  the  12-bit 
(EOL)  code  that  indicates  a  change  from  graphics  to  symbol  mode 
must  be  deleted  on  transmission,  and  added  on  reception. 


5.0  Conclusions  And  Recommendations 

Figure  5-1  summarizes  the  subjective  evaluations  given  to 
each  mixed  mode  technique  for  each  of  the  topics  considered  in  the 
study.  Note  that  there  is  a  slight  preference  for  the  Symbol 
Removal/Line  of  Symbols  technique  relative  to  the  other  three 
algorithms. 

This  study  assumed  that  the  OCR  function  was  performed 
perfectly  for  the  scanned  input  document.  AT&T  has  submitted  a 
recommendation  to  the  CCITT  for  a  mixed  mode  system  describing  a 
specific  approach  to  the  OCR  function.  It  is  recommended  that 
this  algorithm  be  simulated  in  order  to  properly  evaluate  the 
impact  of  a  realistic  OCR  system  on  mixed  mode  performance. 

1) General

1.1. Scope

1.1.1. Concerning the service aspects:

- Recommendation F.200 lays down the operational provisions for the automatic international Teletex service. The service requirements unique to the mixed mode of operation are described in Annex C of Recommendation F.200.

- Recommendation F... should describe the service requirements for the G4 facsimile service.

1.1.2. On the technical side:

1.121. The terminal equipment is defined by

- Recommendation S.60 for teletex,

- Recommendation T.a for G4 facsimile.

1.122. Concerning the information coding:

- Recommendation S.61 defines the character repertoire and coded character set for the international teletex service,

- Recommendation T.b defines the coding scheme used in G4 facsimile equipment,

- Recommendation S.100 defines the coding scheme used in videotex services.

1.123. Recommendation S.62 specifies the control procedure for the teletex and G4 facsimile services.

Note: the generalized session protocol, under discussion between CCITT and ISO, should also be considered.

1.124. Recommendation S.70 specifies the network independent basic transport for teletex and G4 facsimile services.

1.1.3. This Recommendation S.a defines the protocol to be used within the Telematic Services when a document structure is required, e.g. for Mixed Mode Teletex and for Group 4 Facsimile.

S.a is embedded in a framework of Recommendations for Telematic Services.


1.2. General principles

1.2.1. Document architecture concept

1.2.1.1 For the purpose of this Recommendation, a document is an amount of text that is interchanged between telematic terminals.

1.2.1.2 A document can be interchanged for two major purposes.

- It may be interchanged as an original in a final form allowing for printing, displaying and filing at the recipient.

- It may be interchanged in a revisable form allowing for processing at the recipient. Processing includes editing, reformatting, filing and other manipulations.

1.2.1.3 Text is information for human comprehension that can be presented in a two-dimensional form, e.g. printed on paper or displayed on a screen.

1.2.1.4 Text consists of graphic elements such as character box elements, geometric elements and photographic elements, which constitute the content of a document.


1.2.1.5 The content of a document needs to be separated into various portions in order to:

- delimit the presentation units (e.g. pages),

- define logical objects, such as paragraphs,

- use different types of coding,

- allow processing after communication.


... a document for their positioning and rendition on the presentation media.

The logical structure relates the content of a document to logical text objects serving specific purposes, such as sections, headings, paragraphs, footnotes and figures.


The  architecture  that  is  particular  to  a  given  document  is 
called  specific  document  architecture. 




The  interchanged  generic  document  structures  help 

-  to  improve  the  transmission  efficiency, 

-  to  maintain  the  consistency  of  the  document  with  the 
document  class  during  revision  at  the  recipient  and 

-  to  facilitate  the  creation  of  new  documents  of  a  certain 
class . 


Therefore a comprehensive description of a document comprises specific and generic structures, as shown in Figure 2.

Two major types of formats are distinguished:

- text imaging formats,

- text processing formats.

Figure 2. Document Architecture


1.2.2. Communication concepts.


1.221. The capabilities of the document interchange protocol are negotiated in the session establishment phase.

The  terminal  capabilities  required  in  order  to 
receive  the  document  are  globally  indicated  by  the 
"Pi'i-au neat Jjliu  bevel  Protocol"  parameter. 

1.222. The necessary elements describing the document structures are defined in this Recommendation S.a and will be carried inside the Session Service Data Units (CSUI-CDUI S.62 commands).
4 Document architecture

4.1 Specific document architecture

4.1.1 The architecture that is particular to a given document is called specific document architecture. It consists of the following components (see Figure 4):

- content

- layout structure   } specific

- logical structure  } document structure

- layout directives

4.1.2 For the presentation on paper or screen the content of a document is physically structured into pages, blocks, lines, character boxes etc. This structure is called layout structure and the objects building up this structure are called layout objects. In the final form the content is divided into portions which belong to these layout objects.

Figure 4. Document Architecture Model

4.1.3 On the basis of the final form, only such processing can be done efficiently as causes no reformatting in the environment of the manipulated layout object. It may comprise layout revisions such as scaling and moving blocks within empty space, or overlaying them transparently or opaquely. In cases where the environment needs to be reformatted, this has to be done manually by the user.

4.1.4 Documents are usually logically structured in order to enhance the comprehension of the text. In the final form of documents the logical structure may be implicitly expressed by the layout of the content, i.e. its arrangement within pages and its type style.

In the revisable form the logical structure of a document is explicitly represented by logical objects, e.g. sections, paragraphs, footnotes, figures etc. The content is divided into portions which belong to the logical objects. The logical structure can be edited (revised) by commands like "insert, delete, move" etc. applied to logical objects like a paragraph. The accompanying figure shows an example of a specific logical structure.

4.1.6 In the revisable form layout directives can be associated as attributes with the logical objects. These layout directives allow for the control of an automatic formatting and layout of the content portions belonging to the logical objects during editing (revision). Such layout directives may be "centered, left aligned, two columned" etc. and "emboldened, underlined, italic" etc., applied to paragraphs, sections etc.

Figure. Example of a specific logical structure

In a given document common content portions may occur several times, like a logo on several pages or a standard paragraph within several sections. For the purpose of transmission efficiency the common content portions need to be transmitted only once, in that part of the document interchange format which contains the generic document structure. In the specific document structure there need to be only references to those common content portions.
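The idea of transmitting common content once and referring to it elsewhere can be sketched as follows. This is a minimal illustration only: content portions are modelled as plain strings, and the reference format and function name are hypothetical and are not part of the draft Recommendation.

```python
def split_common_content(portions):
    """Separate repeated content portions from a list of portions.

    Returns (generic, specific): generic maps a reference id to content that
    occurs more than once; specific lists either literal content or a
    ("ref", id) pair pointing into generic.
    """
    counts = {}
    for portion in portions:
        counts[portion] = counts.get(portion, 0) + 1

    generic = {}
    ids = {}
    specific = []
    for portion in portions:
        if counts[portion] > 1:
            if portion not in ids:
                ids[portion] = len(ids) + 1
                generic[ids[portion]] = portion
            specific.append(("ref", ids[portion]))
        else:
            specific.append(portion)
    return generic, specific


# A hypothetical two-page document with the same logo on both pages.
pages = ["LOGO", "text of page 1", "LOGO", "text of page 2"]
generic, specific = split_common_content(pages)
```

Here the logo is carried once in the generic part, and the specific part holds only a reference to it on each page.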

4.2.4 The generic layout structure defines layout templates for pages containing the position of predefined blocks with common content (e.g. logos) and of "frames" (e.g. the image area, an address area) within which the content of the logical objects may be dynamically formatted. Such a template might represent a standardized form like ISO 3535. Figure 3 shows the example of a template of a form.


The generic layout structure also allows for predefined sequences of pages with predefined layout, e.g. a template for the "cover page" followed by a template for the "introduction" page and a template for all "new section pages" of a certain document class.

4.2.5 The generic logical structure allows for the definition of types of logical objects, named generic logical objects, which are characteristic for the document class, and for the definition of their hierarchical order and their possible sequential order. The hierarchical order is expressed by a "consist of" relation, e.g. for a section consisting of none, or one or more subsections, which may consist of one or more paragraphs. The sequential order is expressed by a "followed by" relation, e.g. a head followed by none or one abstract, followed by one or more sections, followed by none or one reference list. The generic logical structure can be regarded as a set of rules (i.e. a grammar) from which specific logical structures can be derived.

The accompanying figure shows an example of a generic logical structure.
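The view of the generic logical structure as a grammar can be sketched as follows, using the "report" example given in the text. The one-letter codes, the table of permitted children and the function names are assumptions made for illustration; the sketch checks only which object types may appear, not how many of each are required at the lower levels.

```python
import re

# "Followed by" rule for the top level of a hypothetical "report" class,
# using one letter per generic logical object:
# H = head, A = abstract, S = section, R = reference list.
REPORT_SEQUENCE = re.compile(r"HA?S+R?")

# "Consist of" rules: which child types each object type may contain.
CONSISTS_OF = {
    "section": {"subsection"},
    "subsection": {"paragraph"},
}


def sequence_is_valid(object_codes):
    """Check a sequence of top-level objects against the 'followed by' rule."""
    return REPORT_SEQUENCE.fullmatch(object_codes) is not None


def children_are_valid(parent_type, child_types):
    """Check the children of one object against the 'consist of' rule."""
    allowed = CONSISTS_OF.get(parent_type, set())
    return all(child in allowed for child in child_types)
```

A specific logical structure that passes both checks is one that can be derived from these rules; an editor at the recipient could apply the same checks after each revision.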



The  formatting  and  layout  process  creates  or  modifies  the 
layout  structure  and  results  in  layout  objects  like  blocks 
arranged  and  styled  according  to  the  layout  directives. 

4.2 Generic document architecture

4.2.1 A given document can be regarded as a member of a document class like a business letter, report, purchase order etc.

The generic document architecture provides the user with means to define rules for the logical structure and templates for the layout which are characteristic for a given document class. A document class is defined by the application. It is not intended to standardize any document class by Recommendation S.a.

4.2.2 The specific architecture of a given document can be built according to the rules and templates of its document class. They are described by the generic document architecture of the document class, which consists of the following components (see figure):

- common content portions

- generic layout structure   } generic

- generic logical structure  } document structure

- generic layout directives.


The  interchanged  generic  document  structures  help 

-  to  improve  the  transmission  efficiency, 

-  to  maintain  the  consistency  of  the  document  with  the 
document  class  during  revision  at  the  recipient  and 

-  to  facilitate  the  creation  of  new  documents  of  a  certain 
class. 

4.2.3 The common content portions of a document class are predefined portions of text, like the geometric elements of logos, the character box elements of standard paragraphs in authority documents etc., which are common to all specific documents of that class.

Figure. Example of a generic logical structure of a document class named "report"


4.2.6 The generic layout directives are attributes of the generic logical objects and apply to all specific logical objects of the same type. Similar to the specific layout directives, which are associated only with single logical objects, the generic layout directives allow for the control of an automatic layout of the content of logical objects on the presentation media. There are two types of layout directives, the one affecting the positioning and the other affecting the type style of logical objects. Specific layout directives may override generic layout directives. If a given document has been changed in this way it is no longer a member of the class from which it was originally derived. Figure 5 demonstrates the functions of the generic structures.

Figure 5. Functions of the generic structures
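The precedence of specific over generic layout directives can be sketched as follows. Directives are modelled here as simple name/value pairs; the directive names, the values and the function names are hypothetical, and the class-membership test is only one possible reading of the rule stated in 4.2.6.

```python
def effective_directives(generic, specific):
    """Combine layout directives; specific entries take precedence."""
    merged = dict(generic)
    merged.update(specific)
    return merged


def still_in_class(generic, specific):
    """True if no specific directive changes a value set by the class."""
    return all(generic.get(name, value) == value for name, value in specific.items())


# Hypothetical directives for paragraphs of some document class.
generic_paragraph = {"alignment": "left aligned", "style": "normal"}
specific_paragraph = {"style": "italic"}
result = effective_directives(generic_paragraph, specific_paragraph)
```

In this example the paragraph keeps the alignment given by the class but takes its type style from the specific directive, so under the reading above the document would no longer count as a member of the class.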


5.1 Categories of interchange formats

By appropriate selection among the components of the document architecture, different interchange formats with different capabilities can be derived.

Two major categories of interchange formats are distinguished:

- text imaging formats (TIF)

- text processing formats (TPF)

5.2 Text imaging formats

5.2.1 Text Imaging Formats mainly support the imaging (printing, displaying) of documents at the recipient. The document is interchanged as an original in the final form by interchanging its content and its layout structure. The content and layout structure of such received documents can be edited (revised), but only by doing any reformatting manually.

The text imaging format offers no support for automatic reformatting.

There are two text imaging formats, TIF.1 (Basic TIF) and TIF.2 (see Figure 4).


5.2.2 The text imaging format TIF.1 is called Basic TIF. It contains the content structured by the objects "pages", "frames" and "blocks" of the specific layout structure.

The Basic TIF represents the final form of a formatted document and allows for an exact reproduction of its image at the recipient as created by the originator. There is at least one frame, which represents the image area (= printable area).

For the easy repositioning of logically related blocks which need to retain their relative positions during layout revision at the recipient, there might be additional frames which enclose these blocks. Such a frame could be one enclosing a diagram block and several caption blocks of a figure. However, these frames need not necessarily be interchanged.

Note: In the context of TPF, frames have an additional essential function. They define boundaries within which the content of the objects of the logical structure can be automatically formatted. Therefore layout directives which affect the positioning refer to frames.

5.2.3 The text imaging format TIF.2 adds to the objects of the Basic TIF the objects of the generic layout structure.

It allows for transmission efficiency. Predefined content of layout objects which occurs repetitively on different pages, like constants of forms, is transmitted only once within the generic part of the text imaging format. In cases where the document class is known by the recipient, e.g. within a company or for standard forms such as ISO 3535, TIF.2 contains only the name of the document class within the generic part of the format, and the detailed generic information is added by the recipient.

5.3 Text Processing Formats

5.3.1 Text Processing Formats support the processing of documents at the recipient. The document is interchanged in the revisable form by interchanging its content, the logical structure and the layout directives. The content, the logical structure and the layout structure can be edited at the recipient, supported by automatic reformatting.

There are three text processing formats, TPF.1 (Basic TPF), TPF.2 and TPF.3 (see Figure 6).


5.3.2 The text processing format TPF.1 is called Basic TPF. It contains the content structured by the objects of the specific logical structure, the specific layout directives and the objects of the generic layout structure. In the case of TPF.1 the positioning layout directives refer to the generic frames of the generic layout structure (e.g. the image area).

The TPF.1 allows for transmission efficiency by saving the interchange of a not yet final layout. The unformatted document can be formatted at the recipient according to the specific layout directives given by the originator, which may be completed and revised by the recipient. The content, the logical structure and the layout structure of the document can be edited together with an automatic formatting according to the layout directives.

£.3.3 The text processing format TPF.2 adds to TPF.1 the rules
and objects of the generic logical structure and the generic
layout directives.

The TPF.2 allows for the interchange of unformatted documents
which can be formatted at the recipient according to the
generic layout directives given by the document class and the
specific layout directives explicitly defined by the
originator. The document can be revised under control of the
generic logical structure, which helps to maintain consistency
with the properties of the document class.

The types of logical objects which may occur in the specific
logical structure and their possible hierarchical and sequential
orders are defined. In order to format the document, layout
directives have to be added to the logical objects by the
recipient. In cases where the document class is known at the
recipient the generic part of the format contains only the name
of the document class and the detailed generic information is
added by the recipient.

£.3.4  The  text  processing  format  TPF.3  adds  to  the  objects  and 
layout  directives  of  TPF.2  the  objects  of  the  specific 
layout  structure. 

The TPF.3 allows for the interchange of an already formatted
but still fully revisable document. It adds to the processing
capabilities offered by the TPF.2 the capability of reproducing
the image at the recipient exactly identical to the one
originated by the sender. This format may be named
"full text format".
PRATT et al.: COMBINED SYMBOL MATCHING FACSIMILE

I. Introduction

MOST facsimile coding systems previously developed have been
based on the concept of run-length coding [1]. Run-length coding
methods provide a relatively high compression ratio for a
graphics type of document or an alphanumeric document containing
a small amount of text [2]. But the achievable compression ratio
drops appreciably if a document is filled densely with
alphanumeric characters because the black and white run-lengths
become quite short. Dense alphanumeric documents can be
efficiently coded by symbol recognition in which individual
symbols are detected and coded by a prototype library code
[3], [4]. However, such a method cannot effectively handle
documents containing a mixture of alphanumeric and graphics. One
proposed approach to this problem has been to segment a document
into strips containing alphanumeric text or graphics data, and
then code the former by symbol matching and the latter by run
length [5]. The problems with this approach are the difficulty
of document segmentation and the [...] accurate. This paper
introduces a new concept of hybrid symbol-matching/run-length
coding in which a document is dynamically segmented into symbol
and graphics regions [6].

Conceptually, the symbol versus graphics segmentation process
employed in the facsimile compressor is quite simple. A document
is scanned line by line, and all isolated symbols that are
expected to recur in the document are extracted and coded by a
symbol-matching process. The remainder of the document, called
the residue, is coded by two-dimensional run-length coding.
[...] symbols to be coded by symbol matching [...] symbols from
that portion of the document which is run-length coded. The
result is an efficient match between the type of data and the
chosen coding methods.

The symbol-matching process previously described has been
adapted to recognize alphanumeric characters in a document.
In this symbol recognition mode of operation, the document
is represented by conventional printer codes: character, space,
[...]
[...] along with the location coordinates of the symbol. If the
comparison is unsuccessful, the new symbol is both transmitted
and placed in the library. Those areas of a document in which
the blocker cannot isolate a valid symbol are assigned to a
residue, and a two-dimensional run-length coding technique is
used to code the residue data. The following sections describe
key elements of the coder in greater detail.


[...] for both the facsimile and symbol rec[...]
the CSM coding system for facsimile coding. In operation, [...]
average character height are stored in a scrolled buffer. This
data is then examined line by line to determine if a black pixel
exists. If the entire line contains no black pixel, this
information is encoded by an end-of-line code. If a black pixel
exists, a blocking process is conducted to isolate the symbol.
For those isolated symbols, further processing is required to
[...] already exists in the library. This process involves the
extraction of a set of features, a screening operation to reject
unpromising candidates, and finally a series of template matches.
The first blocked character and its feature vector are always
[...]


A.  Symbol  Blocking 

The function of the symbol blocker is to examine the input
buffer in a systematic fashion, and to locate the position and
size of any isolated characters. Fig. 2 illustrates the blocking
process. A black pixel in the buffer, denoted by the character
"1," is considered to be a key pixel when the four neighbors
located above it and to its left are white, as shown below.


Whenever a key pixel is encountered, the blocker is initiated.
The blocker extracts those pixels from the buffer that are
contiguous with the key pixel, or enclosed by a set of
contiguous black pixels. For example, with the lower case letter
“a,” all black pixels and the enclosed white “island” will be
extracted by the blocker.
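
The key-pixel test and the blocking step can be sketched as follows. This is only an illustration under stated assumptions, not the CSM implementation: because the figure is missing here, the four neighbors are assumed to be the left neighbor plus the three pixels in the row above; contiguity is assumed to be 8-connected; and the names `is_key_pixel`, `block_symbol`, and `bounding_box` are hypothetical. Extraction of enclosed white islands is left out.

```python
from collections import deque

def is_key_pixel(img, r, c):
    """Black pixel whose left and three upper neighbors are all white (assumed mask)."""
    if img[r][c] != 1:
        return False
    for nr, nc in ((r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)):
        inside = 0 <= nr < len(img) and 0 <= nc < len(img[0])
        if inside and img[nr][nc] == 1:
            return False
    return True

def block_symbol(img, r, c):
    """Return the set of black pixels 8-connected to the key pixel (r, c)."""
    seen = {(r, c)}
    queue = deque([(r, c)])
    while queue:
        cr, cc = queue.popleft()
        for nr in range(cr - 1, cr + 2):
            for nc in range(cc - 1, cc + 2):
                inside = 0 <= nr < len(img) and 0 <= nc < len(img[0])
                if inside and img[nr][nc] == 1 and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return seen

def bounding_box(pixels):
    """Return (top, left, height, width) of a set of (row, col) pixels."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    return (min(rows), min(cols),
            max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)

# Toy buffer holding two isolated symbols.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
key_pixels = [(r, c) for r in range(len(page)) for c in range(len(page[0]))
              if is_key_pixel(page, r, c)]
first_block = bounding_box(block_symbol(page, *key_pixels[0]))
```

Scanning in raster order means each symbol's key pixel is met before the rest of the symbol, which is why the blocker can be triggered from it.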

B. Feature Extraction

The most straightforward method to determine whether a
match exists between an unknown symbol and one of the
symbols stored in the library is to perform a template match
between the unknown and every library symbol. However, a
[...] extract a set of scalar “features” from the various
symbols [...] the library. These features are used to reduce or
“screen” the number of candidates for a template match to a
tiny fraction of all the possibilities in the library.

The features used in the screening process are the block
height, block width, symbol perimeter, and pixel area enclosed
by the outer boundary of the symbol. Fig. 3 provides an
example of features derived from a character.
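
As a rough sketch, the four features could be computed from a blocked symbol as shown below. The text does not define perimeter or enclosed area precisely, so two assumptions are made here: the perimeter is the number of black pixels with at least one white or out-of-block 4-neighbor, and the enclosed area is every pixel of the block that cannot be reached from the block border through white pixels. The function name `symbol_features` is hypothetical.

```python
def symbol_features(block):
    """Return (height, width, perimeter, enclosed_area) for a 0/1 pixel block."""
    h, w = len(block), len(block[0])
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))

    def white_or_outside(r, c):
        # Out-of-range positions count as white background.
        return not (0 <= r < h and 0 <= c < w) or block[r][c] == 0

    # Assumed perimeter: black pixels touching background in a 4-neighborhood.
    perimeter = sum(
        1
        for r in range(h) for c in range(w)
        if block[r][c] == 1
        and any(white_or_outside(r + dr, c + dc) for dr, dc in steps)
    )

    # Flood-fill white pixels from the block border; anything not reached
    # lies on or inside the outer boundary of the symbol.
    outside = set()
    stack = [(r, c) for r in range(h) for c in range(w)
             if (r in (0, h - 1) or c in (0, w - 1)) and block[r][c] == 0]
    while stack:
        r, c = stack.pop()
        if (r, c) in outside:
            continue
        outside.add((r, c))
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and block[nr][nc] == 0:
                stack.append((nr, nc))

    return h, w, perimeter, h * w - len(outside)

ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]      # closed shape, like an "o"
c_shape = [[1, 1, 1], [1, 0, 0], [1, 1, 1]]   # open on the right, like a "c"
ring_features = symbol_features(ring)
c_features = symbol_features(c_shape)
```

Even on these toy blocks the enclosed area separates the closed shape from the open one, which is the kind of cheap discrimination a screening stage needs.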


C. Candidate Screening

The purpose of the screening process is to reduce the burden
on the template matcher by passing only “good prospects” to
the matcher. This is accomplished by calculating the feature
space distance between the unknown and each library entry,
and selecting the library candidate with the smallest distance
as the best prospect for a match. If this match is rejected, the
next best candidate is considered, and so forth. The distance
“metric” used to determine how “close” an unknown is to a
particular candidate is the “city block” distance defined by


$$D(U,C) = \sum_{j=1}^{N_F} \left| F_C(j) - F_U(j) \right|$$


where $F_C(j)$ is the jth feature of the candidate, $F_U(j)$ is
the jth feature of the unknown, $|\cdot|$ denotes the absolute
value, $D(U,C)$ is the distance between the unknown and candidate,
and $N_F$ is the number of features.
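
A minimal sketch of the screening step, assuming each symbol is represented by a tuple of scalar features in a fixed order. The names `city_block` and `screen` and the feature values are illustrative, not taken from the paper.

```python
def city_block(unknown, candidate):
    """City-block distance: sum of absolute feature differences."""
    return sum(abs(fc - fu) for fc, fu in zip(candidate, unknown))

def screen(unknown, library):
    """Return library indices ordered from best prospect to worst."""
    return sorted(range(len(library)),
                  key=lambda i: city_block(unknown, library[i]))

# Hypothetical feature vectors: (height, width, perimeter, enclosed area).
library = [(20, 12, 50, 180), (21, 13, 52, 190), (30, 8, 60, 150)]
unknown = (21, 12, 51, 186)
prospects = screen(unknown, library)
```

The template matcher would then try `prospects[0]` first and move down the list only if a match is rejected.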
[...] of a detected symbol and a library prototype symbol.

[...] C = 1, 2, ..., N_C and R = 1, 2, ..., N_R. A [...] between
a pair of vector patterns a(C, R) and b(C, R) by summing the
number of picture elements (pixels) for which a(C, R) and b(C, R)
differ. This EXCLUSIVE OR error is de[...] if a(C, R) ≠ b(C, R).


A  major  shortcoming  of  the  conventional  template  matcher 
described  above  is  that  it  treats  all  errors  alike  regardless  of 
where  they  occur  spatially.  An  improved  matcher,  to  be 
described,  utilizes  a  “weighted  exclusive  or”  error  criterion 
that  is  based  on  the  context  in  which  the  error  occurs. 

The motivation behind the weighted EXCLUSIVE OR count
error criterion may be appreciated by examining the EXCLUSIVE
OR error (denoted A ⊕ B) in Figs. 4 and 5. Compare the
EXCLUSIVE OR pattern for the “c” and “o” in Fig. 4 with the
pattern for the pair of “e’s” in Fig. 5. Note that the EXCLUSIVE
OR error count for the pair “c” and “o” (count = 23) is
actually less than that for the pair of “e’s” (count = 29),
implying that, by this error metric, “c” and “o” are “closer”
than the pair of “e’s” are to each other. However, the error
pattern for the pair of “e’s,” which should be declared a
match, is composed of sparsely distributed pixels, while the
error pattern for the “o” and “c” shows a dense node of
error pixels corresponding to the missing right segment of the
“o.” One way to quantify the density of such a “node” is
to form a summation in which the “local density” of every
black pixel is merely the sum of all the pixels in its 3 X 3
neighborhood if the pixel is 1, and 0 if the pixel is 0. The
patterns above labeled “weighted XOR error” have been
calculated in this manner. Note that by this criterion, the
“c” and “o” are further separated (count = 131) than are the
pair of “e’s” (count = 73).
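
The difference between the plain and weighted counts can be shown with a short sketch. It assumes both patterns are equal-sized lists of 0/1 rows and that the 3 X 3 neighborhood is simply truncated at the pattern edge; the function names are hypothetical.

```python
def xor_pattern(a, b):
    """Pixel-by-pixel EXCLUSIVE OR of two equal-sized 0/1 patterns."""
    return [[pa ^ pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def xor_error(a, b):
    """Plain EXCLUSIVE OR count: number of differing pixels."""
    return sum(map(sum, xor_pattern(a, b)))

def weighted_xor_error(a, b):
    """Sum, over error pixels, of the error pixels in each 3 x 3 neighborhood."""
    e = xor_pattern(a, b)
    rows, cols = len(e), len(e[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if e[r][c]:
                total += sum(e[nr][nc]
                             for nr in range(max(0, r - 1), min(rows, r + 2))
                             for nc in range(max(0, c - 1), min(cols, c + 2)))
    return total

blank = [[0] * 5 for _ in range(5)]
# Four scattered error pixels versus four error pixels in one 2 x 2 node.
sparse = [[1, 0, 0, 0, 1], [0] * 5, [0] * 5, [0] * 5, [1, 0, 0, 0, 1]]
dense = [[0] * 5, [0, 1, 1, 0, 0], [0, 1, 1, 0, 0], [0] * 5, [0] * 5]
```

Both toy patterns differ from `blank` in four pixels, yet the clustered errors score four times higher under the weighted criterion, which is the effect the text describes for the “c”/“o” pair.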

In the template matcher, the weighted EXCLUSIVE OR error
is computed for nine translation shifts of a pair of patterns
corresponding to horizontal and vertical single pixel shifts of
the patterns. The minimum error is then compared to a
threshold in order to determine whether or not a match
should be declared. The value of the threshold is a nonlinear
function of the symbol’s black count, and is obtained
by an empirically determined look-up table.


E. Library Maintenance

A fixed size library is used in the CSM system. The first
blocked character and its feature vector occupy the first
library slot. The subsequent library slots are occupied by
those blocked characters for which no match is found. In
[...]
Fig. 4. Partial library plot of CCITT no. 4.

Figs. 6 and 7 contain partial library plots of isolated symbols
from two facsimile documents, one a French journal article
(CCITT #4), and the other a Japanese language document
(CCITT #7). The first item on the list is the first isolated
prototype symbol, and all following symbols represent matches
to the prototype.

F. Prototype Coding

After a symbol has been blocked, a decision [...] is
applied to each prototype element of the library that has
passed the screening test. If a match is indicated, only the
matching library ID and horizontal location with respect to
the previous symbol are coded. Otherwise, the binary pattern
of the blocked symbol is transmitted along with the symbol
width, symbol height, and horizontal location, in addition to
being placed in the library as a new prototype element.

The simplest method of prototype coding is to binary code
the pixels within a block in a raster scan sequence. On the
average, about 30 percent of the prototype code bits can be
eliminated by scanning the prototype pixels in a folded
“basket weave” sequence and applying one-dimensional
Huffman coding of the run lengths. The disadvantages of this
approach are additional implementation complexity and possible
loss of bitstream synchronization when an error occurs. The
binary coding approach has been adopted for a high-performance
version of the CSM facsimile coder, and the folded run-length
coding method is used for a very-high-performance version.

G. Residue Coding

In many documents, there are black pixel patterns that do
not meet the criteria of prototype elements. Examples include
exceptionally large or exceptionally small alphanumeric
characters, segments of company logos, and segments of
handwritten script. In the CSM system, these patterns are
rejected by the symbol blocker, and then left behind as a
residue to be coded by run-length coding.

Fig. 8 presents a blow-up of a section of a document
(CCITT #4) and its corresponding residue.

Conceptually, the CSM system could employ any type of
run-length coding method for residue coding. The selection
should be made on the basis of coding performance, tolerance
to channel errors, implementation complexity, and
compatibility with facsimile standards. Considering these factors,
a modified version of the CCITT two-dimensional run-length
coding algorithm has been selected for the residue coder. By
inhibiting the symbol matching process, the CSM coder will
automatically revert to a pure residue coder, which can be
made exactly compatible with the CCITT standard.

H. Transmission Code

The CSM facsimile coding system produces an asynchronous
code that is dependent upon the contents of the document to
be coded. Table I contains a detailed specification of the code
elements and Fig. 9 contains a state diagram defining the code.
The code word lengths in this specification have been
optimized for a scan resolution of 8 X 8 pixels/mm.

TABLE I

CLI Combined Symbol Matching Facsimile Code


I. Extensions of CSM Concept
In a typical business letter scanned at 8 X 8 pixels/mm,
about 40 percent of the compressed code bits are devoted to
the transmission of prototype symbols. Almost all of this
portion of the transmission code can be eliminated if the
documents to be transmitted are restricted to a fixed set of
symbols, for example, Courier typewriter font. In this case,
[...] can be preloaded with [...] symbols. Isolated unknown
symbols detected in the key pixel scanning process that do not
match a library entry can be placed in the residue for
subsequent run-length coding.

The symbol matching process in the CSM system is not
exact; a match tolerance is permitted between symbols to
accommodate perturbations in symbol shape caused by the
scanning process. As a consequence, in the basic CSM system,
a reconstructed document is not an exact pixel-by-pixel replica
of the original document. Although symbol substitution
errors are extremely rare, there may be applications in which
exact coding is demanded. This mode of operation can be
accommodated in the CSM system by a simple modification of
the coder and decoder. At the coder, after a successful match,
the EXCLUSIVE OR between the pair of matched symbols is
formed and placed in the residue for subsequent run-length
coding. At the decoder, the pixel arrays generated from
reconstructed symbols and reconstructed residue are combined
in an EXCLUSIVE OR fashion to correct for differences in the
pair of matched symbols. In this manner, exact reproduction
is achieved. However, the “overhead” associated with the
exact reproduction mode of operation can reduce the
achievable compression ratio by as much as 50 percent at 8 X 8
pixels/mm resolution.
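
The exact reproduction mode rests on the identity (P ⊕ D) ⊕ P = D. A small sketch, with hypothetical names and toy 3 X 3 patterns, shows the round trip:

```python
def xor_blocks(a, b):
    """Pixel-by-pixel EXCLUSIVE OR of two equal-sized 0/1 blocks."""
    return [[pa ^ pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

detected = [[0, 1, 1], [1, 0, 1], [1, 1, 1]]    # symbol as scanned
prototype = [[0, 1, 0], [1, 0, 1], [1, 1, 1]]   # matched library entry

# Coder side: the difference pattern is added to the residue.
correction = xor_blocks(detected, prototype)

# Decoder side: the prototype is pasted in, then corrected by the residue.
restored = xor_blocks(prototype, correction)
```

Only the differing pixels (one, in this example) enter the residue, but on a real page such corrections accumulate, which is consistent with the overhead the text reports.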

III. Symbol Recognition Mode

The CSM algorithm achieves facsimile data compression by
the matching of document symbols against a library of
symbols accumulated during the document scan. If a match
occurs, the library index is transmitted rather than the symbol
binary pattern. This basic concept can be extended to
perform symbol recognition by preloading the library with the
binary symbol patterns of a predetermined set of symbol
fonts. The coder can then operate in a symbol recognition
mode in which only the ASCII codes are transmitted and all
other document data such as a signature or logo are ignored.

A.  Line  Tracking 

In the western world, printed matter is “read” from left to
right and from top to bottom. Therefore, a symbol blocking
system that transmits its output to a serial ASCII terminal
must do the same. However, the CSM algorithm extracts
characters from the document being scanned in a totally
different fashion. As the line buffer scrolls through the page
from top to bottom, the tallest or first encountered characters
are removed from the document and processed through
the recognition algorithm. Thus characters emerge from the
CSM process in a sequence which would be totally
incomprehensible if viewed in chronological sequence. In the
conventional CSM facsimile transmission mode, this is of no
consequence, since characters are placed in their appropriate
address locations regardless of their order of occurrence. In
the serial symbol recognition mode, the transmitter will
assign each character an ASCII code, assemble the codes into
lines, inserting blanks, line feeds, carriage returns, etc., and
transmit the lines serially to the receiver. For single spaced or
rotated documents, this “line tracking” is more difficult than
one would imagine. The problem is basically that of grouping
the characters into lines. Determining the sequence in which
they should be transmitted is relatively easy since the
characters may be sorted by their column addresses. A
benefit of this serial ASCII mode is that no information on
character location need be transmitted, since the correct
sequence is all that is required in order to properly reconstruct
the received document.

The line-tracking algorithm is based on a straight line fit of
the key pixel coordinates of characters on a text line, as
illustrated in Fig. 10. The straight line is defined parametrically
as

$$R = S \cdot C + O \qquad (3)$$


Fig. 10. Line tracking.


where  R  represents  the  row  index,  C  is  the  column  index,  S 
denotes  the  text  line  slope,  and  O  is  its  offset.  As  characters 
are  encountered,  they  are  assigned  to  the  nearest  straight  line 
representing  a  text  line.  The  algorithm  is  as  follows: 

1) The coordinates (C, R) of the first encountered character
are used as a “seed” to start a cluster at $S = 0$, $O = R$.

2) The (C, R) coordinates of the next character encountered
are used to compute $E = [R - S \cdot C - O]^2$ for the slope
and offset of each cluster.

3) If the error is less than a threshold for a given cluster,
the character is assigned to that cluster (text line). If it is
greater than the threshold for all clusters, the oldest cluster
is dumped, and a new cluster is started.

4) If the character was added to an existing cluster, the
values of slope and offset are updated by use of
minimum-mean-square error techniques.
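
The four steps can be sketched as follows. Several details are assumptions rather than the paper's specification: the error is taken as the squared distance from the fitted line including the offset term, the update is an ordinary least-squares refit over the cluster's members, the oldest cluster is dumped only once a cap (`max_clusters`) is reached, and the threshold value and all names are hypothetical.

```python
class TextLine:
    """One cluster: a fitted line R = S*C + O plus its member characters."""

    def __init__(self, c, r):
        self.slope, self.offset = 0.0, float(r)   # step 1: seed at S = 0, O = R
        self.chars = [(c, r)]

    def error(self, c, r):
        # Step 2: squared row distance from the cluster's current line.
        return (r - self.slope * c - self.offset) ** 2

    def add(self, c, r):
        # Step 4: refit slope and offset by least squares over all members.
        self.chars.append((c, r))
        n = len(self.chars)
        mean_c = sum(p[0] for p in self.chars) / n
        mean_r = sum(p[1] for p in self.chars) / n
        var_c = sum((p[0] - mean_c) ** 2 for p in self.chars)
        if var_c > 0:
            cov = sum((p[0] - mean_c) * (p[1] - mean_r) for p in self.chars)
            self.slope = cov / var_c
        self.offset = mean_r - self.slope * mean_c

def track_lines(chars, threshold=25.0, max_clusters=4):
    """Group (column, row) key-pixel coordinates into text lines."""
    active, finished = [], []
    for c, r in chars:
        best = min(active, key=lambda t: t.error(c, r), default=None)
        if best is not None and best.error(c, r) < threshold:
            best.add(c, r)                          # step 3: join nearest line
        else:
            if len(active) >= max_clusters:
                finished.append(active.pop(0))      # dump the oldest cluster
            active.append(TextLine(c, r))
    # Within a line, transmission order is simply ascending column address.
    return [sorted(t.chars) for t in finished + active]

# Characters from two slightly skewed lines, arriving interleaved.
arrivals = [(5, 10), (8, 30), (20, 11), (25, 31), (40, 12), (45, 32)]
lines = track_lines(arrivals)
```

Sorting each cluster by column recovers the left-to-right reading order even though the characters were extracted out of sequence.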

B.  Handling  of  Special  Characters 

A number of characters which consist of two “subcharacters”
must  be  treated  as  special  cases  in  the  symbol-recognition 
mode.  This  is  because  the  blocker/matcher  would  otherwise 
fragment  them  into  their  constituent  parts  and  give  misleading 
results.  These  characters  are:  (i),  (j),  (!),  (?),  (:),  (;),  (*),  and 
(*).  After  recognition  of  the  two  parts  of  the  character,  the 
system  will  check  if  two  compatible  symbols  are  on  top  or 
almost  on  top  of  each  other.  If  so,  the  two  symbols  are 
merged  into  one.  For  example  two  (-)’s  on  top  of  each  other 
will  be  merged  into  a  (:). 

IV.  Compression  Ratio  Evaluation 

The CSM system has been extensively evaluated by computer
simulation to optimize its performance and to determine
its compression ratio with respect to other coding methods.

A.  Facsimile  Mode  Evaluation 

The CCITT document set of eight digitized documents of
200 X 200 lines/in (8 X 8 pixels/mm) resolution, shown in
Fig. 11, has been used for evaluation of the CSM system in its
facsimile mode of operation. Tables II and III contain listings
of the compression ratios for each of the documents for the
high-performance and very-high-performance versions of the
CSM algorithm, respectively. These tables also contain the bit
allocations for each of the code elements defined in Table I.

Table IV presents a summary comparison of the compression
ratios of the high-performance and very-high-performance
CSM systems with several other facsimile coding methods.
The modified Huffman code is the CCITT adopted standard
for one-dimensional run-length coding [2]. The IBM code
[7], READ code [8], and BPO code [9] are proposals for a
CCITT standard employing two-dimensional run-length
coding. These algorithms all provide for an end-of-line code. All
of the algorithms in Table IV have been simulated and
evaluated on the same set of digitized documents scanned at the
University of Hannover, Germany. The K factor indicates the
number of lines in which the coder is operated in its
two-dimensional mode before it reverts to a one-dimensional mode
to limit the propagation of errors.

Comparison of the compression performance of these
algorithms indicates that the CSM methods outperform the
run-length coding techniques substantially for text-predominate
documents, and perform at about the same level as the best
of the two-dimensional run-length coding methods for
graphics-predominate documents.


The compression factor obtained for this document for
operation of the CSM system in the symbol matching mode is about
257:1 and for operation in the facsimile mode is about 49:1.


B. Symbol Recognition Mode Evaluation
The symbol recognition mode system has been tested with
86 sets of data, each containing 1000 samples of one of the
86 symbols of the Courier 10 font. In these tests, no
mismatches occurred, and only very badly damaged characters
were rejected.

Fig. 12 contains an example of a business letter and its
reconstruction with the symbol matching coding mode of
operation. It should be noted that the reconstructed letter has been
printed with a different font than the original; however, the
[...] of the two letters are in basic agreement.


V.  System  Implementation 
Although the CSM system is more complex to implement
than a conventional two-dimensional run-length coding
system, with the advent of high-speed and relatively inexpensive
memory, discrete logic circuits, and microprocessors,
implementation complexity has ceased to be a deterrent to the
development of high-performance systems. A 100 X 100
lines/in (4 X 4 pixels/mm) facsimile coder using the CSM
algorithm was introduced by Compression Labs, Inc. of
Cupertino, CA, in Fall 1978. This unit utilizes a microprocessor
to implement the algorithm for transmission at
subminute page rates. A discrete logic implementation of the
CSM algorithm is being developed by Compression Labs for
transmission rates of less than 5 s for a 200 X 200 lines/in page.


VI. Summary

A new high-performance method of facsimile data
compression, called CSM, has been introduced. The coding system
involves segmentation of a document into symbols, which are
coded by template matching, and into a residue of the
remainder of the document, which is coded by two-dimensional
run-length coding. Computer evaluation indicates that the
compression factor for text-predominate documents is about
twice that obtained with two-dimensional run-length coding,
and about the same for graphics-predominate documents.
The CSM system can be operated in a pure symbol recognition
mode in which a document is coded by recognition of its
alphanumeric symbols. Compression ratios greater than 250:1
can be achieved on business letters in this mode of operation.